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Abstract 

A goal of low-level neural processes is to build an efficient code extracting the 
relevant information from the sensory input. It is believed that this is implemented 
in cortical areas by elementary inferential computations dynamically extracting 
the most likely parameters corresponding to the sensory signal. We explore here 
a neuro-mimetic feed-forward model of the primary visual area (VI) solving this 
problem in the case where the signal may be described by a robust linear gen- 
erative model. This model uses an over-complete dictionary of primitives which 
provides a distributed probabilistic representation of input features. Relying on an 
efficiency criterion, we derive an algorithm as an approximate solution which uses 
incremental greedy inference processes. This algorithm is similar to 'Matching 
Pursuit' and mimics the parallel architecture of neural computations. We propose 
here a simple implementation using a network of spiking integrate-and-fire neu- 
rons which communicate using lateral interactions. Numerical simulations show 
that this Sparse Spike Coding strategy provides an efficient model for representing 
visual data from a set of natural images. Even though it is simplistic, this trans- 
formation of spatial data into a spatio-temporal pattern of binary events provides 
an accurate description of some complex neural patterns observed in the spiking 
activity of biological neural networks. 

Keywords: Neuronal representation, inverse linear model, over-complete dictionar- 
ies, distributed probabilistic representation, spike-event computation. Matching Pur- 
suit, Sparse Spike Coding. 

1 Toward a functional model of the neural code 

A major problem in neuroscience is to understand the content of the activity that is 
observed in biological neurons. These complex activity patterns that are the basis of 
our cognitive abilities remain a mystery and there is yet no known unifying model 
explaining the "language" that could be used by neurons at the various scales of the 
central nervous system. In particular, descriptive models of the neural activity tend to 
be incomplete or to reflect a distorted description of natural conditions 1 18]. We will try 
here to overcome these problems by precisely defining the model and the hypotheses 
that we want to validate. We will assume here that there exists a functional neural code 
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and that we may decipher the neural activity by exploring algorithms — based on the 
nature and architecture of the neural system — that solve efficiently the function that is 
provided by the system. We will illustrate this method for the primary visual area (VI) 
in the human by trying to define precisely its function and then by proposing a model 
for the neuronal representation and for the mechanisms that may implement it. 

1.1 Solving inverse problems using neural networks 

VI is a cortical area specialized in low-level visual processing from which the majority 
of the visual information diverges to higher visual areas. We will describe it here as im- 
plementing an inverse problem by analyzing images thanks to an internal model. The 
hypothesized function over the long term (in the order of hours to years) will thus be to 
process natural scenes (that is images that occur frequently) so as to progressively build 
a "model" of their structure. The goal is that for any of these images, this model must 
rapidly (in the order of a fraction of a second) represent a set of features relevant to that 
imagqjand corresponding to this model (see Fig.[T]i. This representation, including for 
instance the location and orientation of the edges that outline the shape of an object, is 
then relayed to higher level areas to allow, for instance, a robust recognition of useful 
patterns. Actually, this is similar to numerous tasks in engineering and applied math- 
ematics, where a reverse-engineering process allows to find a robust representation of 
the data (such as an estimation of the internal state of a system in control theory) by 
identifying the so-called hidden parameters of the system. The success of this algo- 
rithm over the long term (in the order of days to generations) allows then to validate 
the model that was learned through the pressure of evolution. In this framework, it is 
thus easier to describe cortical activity as the result of the inversion (or analysis) of an 
internal model of the world. 

Moreover, such a model of the world should also take into account some basic knowl- 
edge of actual physical interactions. This idea is based on the assumption that the obser- 
vations are the effect of the interplay between different causes corresponding to stable 
physical interactions and that they should be recovered to describe the observed data by 
representing the underlying actual physical structure. In particular, some knowledge of 
the usual transforms of the signal (such as translation and scaling in images) which are 
related to regularly occurring changes in the physical world (lateral and frontal trans- 
lations of objects in space) allows then for a robust representation and further analysis 
by higher level cortical areas. This may finally allow for desired properties such as an 
invariant representation of objects to these frequently occurring transformations. 
We will restrict here this artificial neural network to a feed-forward model of V 1 which 
processes flashed static imagefl We will assume that the model of natural images is 
fixed and accurate and we will define the goal of our model as recovering the sources 
(corresponding to some hidden state variables, see Fig.[T]i from an observed static im- 
age. Moreover, in the framework of natural living systems, we will assume that a main 
constrain from evolution is the ability to process the information as quickly as possible. 
This model will consist in these restrictive conditions to a one-layered neural network 
as illustrated in Fig. |2]and the output of the neural layer should describe at best and as 

'We will consider here that each neuron may be characterized by a prefen'ed pattern to represent. It 
should though be emphasized that this view differs from the "grand-mother neuron" paradigm since the 
representation emerges from the interaction of different active neurons. 

^In particular, we will study the transient response of the network and neglect the information fed back by 
higher areas. This latter information will be necessary in more complex algorithms which take into account 
the context of a local feature. 
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Figure 1 : Inverse-mapping as a a goal for sensory neural coding. The visible world 
is modeled as the interaction of a large set of hypothetical physical sources (world 
model) according to a known model of their interactions ("synthesis"). We will con- 
sider that for sensory cortical areas, the goal of the neural representation (and its im- 
plementation by the neural code) is to analyze the signal so as to recover at best and 
as quickly as possible the sources that generated the signal ("analysis") . The analysis 
may thus be considered as an inverse mapping of the synthesis. A proposed solution 
for this problem is to infer at best the most probable hidden state. 

quickly as possible the visual content. Considering the system as an information chan- 
nel (according to the definition of Shannon and Weaver |27]) which processes samples 
from the set of natural images, we may therefore define the goal of VI as to transmit 
the information about the sources (in bits) at the highest rate as possible. 

1.2 Inverse models for sensory processing 

To build an algorithm of the inverse model to efficiently code the input, we will first 
define the forward synthesis model as a Linear Generative Model (LGM) as is often 
assumed for natural images IitIi . For visual data, images consist indeed of the set of 
observed luminance values from different spatial positions and a fairly good approxi- 
mation — especially for small images of non-occluding objects — considers the image 
as the linear combination of "primitive images", similarly to the superposition of trans- 
parent layers. This approximation is based on the assumptions that the energy of the 
photonic flow from a spatial position (the luminosity) consists of the multiplicative 
interaction of different "shapes" that contribute each for a fraction of the global lumi- 
nosity. Thanks to the non-linear gamma transform of luminosities into luminances |22l 
which approaches a logarithmic function, these "shapes" add up linearly in the lumi- 
nance space. Although this is justified in practice for transparent shapes, it is not for 
occlusions. The LGM framework provides however a general framework for describ- 
ing natural images. 

The forward model defines images as the superposition of shapes of different intensi- 
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ties which correspond in our framework to scalar "hidden states". Formally, we will 
describe this set of scalars by a vectoJI s = {sj}i<j<N where N is the dimension 
of the dictionary. Similarly, one image will be described as a point in a multidimen- 
sional state space of dimension M where every pixel corresponds to one dimension 
(and therefore the pixel value will be its scalar value along this dimension). This obser- 
vation signal will be written x = {xi}i<i<M over the set of spatial positions denoted 
by their address i (that is the pixels over a rectangular grid in an image processing 
framework). To define the LGM, we will use a "dictionary" of images as the matrix 
A= {Aj;l <j< A^} of the images of the "primitive shapes" Aj = {j4y}i<i<M- 
The image corresponding to the internal state s will finally be defined as: 

X = Sj.Aj (1) 

This model of natural images is defined by the statistics of the sources S and by the 
dictionary A of primitive images. The latter corresponds to the set of basis functions 
which describe the space of all observed natural images I ~ {x} that we wish to char- 
acterize. 

In this paper, we will use the same fixed dictionary of filters (that is A) and assume sim- 
ilar hypotheses on the statistics of S to rate the efficiency of different coding strategies. 
Using this formalization, the function of the neural network consists in recovering the 
sources by inverting the synthesis process. The results of this inversion (in the space 
of the neural representation) will thus share the same dimension (that we noted N) as 
the space of the sources, that is the cardinal of the dictionary. As a first approximation 
(and as is observed in simple cells from VI), the dictionary of primitive shapes will 
correspond to localized orientation selective edges at different positions and scales re- 
sembling Gabor functions lfTlll23ll at different spatial scales. This may be particularly 
adapted in an information theoretic based framework as these shapes correspond to in- 
dependent features in natural scenes jjl. We choose here that the forward model will 
be described by a wavelet transform lll4ll and we will use this architecture to compare 
different coding strategies. 

1.3 Efficient coding of natural images 

In fact, particular care should be put on the parameters of this wavelet architecture. In 
particular, it is desirable for the representations of natural images to be robust to nat- 
ural conditions. As is the case for natural images, we will consider that the observed 
signals are generated by sources that share certain features which differ by continuous 
transformations such as edges at different time, position, orientation or scale. Since 
the corresponding spatial transformations (translations, rotations and scaling) are very 
common, if there exists a corresponding transformation in the source space (that is if 
this transformation of all sources are in the dictionary), the resulting representation of 
the transformed image should simply be derived by a transformation (in the source 
space) of the original representation. Thus, it is necessary for the dictionary to be 
invariant according to these usual transformations for the representation to be robust. 
In particular, this allows for instance for higher level areas to detect some specific in- 
puts with an invariance to usual transformations. Typically, this robustness constraint 
implies in our architecture that the tiling of the wavelet filters is smoother than an or- 
thogonal representation [21]. As a consequence, the dictionary will be over-complete, 

^in the following, we will denote vectors and matrices by bold characters 
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i.e. the number of dictionary elements will be of several orders of magnitude larger 
than the dimension of the image space (that is iV >> M). 

From the definition of the forward model, for any signal x, there exist at least one set 
of parameters s which recovers the observed signal. However, in the case where the 
dictionary is over-complete, the inversion of the LGM will not yield an unique solution 
in S to any given signal in T : the problem is ill-posed. The coding strategies corre- 
sponding to possible 'analysis' algorithms (see Fig.[T]i have different efficiencies and, 
in particular, the solution given by the wavelet coefficients with an over-complete dic- 
tionary yields an highly redundant representation. According to Barlow |i3J, the goal of 
sensory processing would be rather to choose the most efficient representation: follow- 
ing the same argument as the Occam razor, whenever there is the choice between two 
representations, the best is the one that is the most parsimonious. In our framework, a 
possible goal would be to maximize the mean codeword length, that is get the coding 
strategy that describes at best the images. From Shannon's source coding theorem |27|], 
this length is bounded by the entropy of the images for a given architecture and coding 
strategy. Under some assumptions that we will develop later, this is equivalent to find 
the sparsest representation, that is the representation that uses the smallest number of 
sources iHtIi . This sparseness constraint thus allows to restrict the different solutions of 
the inversion of the forward model so as to find an appropriate candidate for the neural 
code. 

However, the combinatorial complexity of the inverse problem grows very quickly as 
the dimension of the dictionary increases (it's NP-complete, see lll4ll ). There exists 
therefore no simple algorithm that optimizes exactly the problem in reasonable time as 
we handle more complex signals such as natural images, but acceptable sub-optimal 
strategies to approach this problem do exist (see a review in ifigll '). Most popular so- 
lutions optimize a compromise between the reconstruction error and the sparsity and 
are based on linear optimization or gradient approaches [28]. Following the same ar- 
guments as Barlow yl], we explore an alternate solution which uses a probabilistic 
representation and Bayesian inference. 



2 Sparse spike coding using a greedy inference pursuit 

Focusing on the event-based nature of axonal information transduction and in order to 
reflect the parallel architecture of the nervous system, we will here propose a solution 
for inverting the forward model that we defined for natural images. This will build 
a Bayesian inference framework based on feature-matching neurons and on spikes as 
events representing primitive "decisions". 

2.1 Greedy inference pursuit using spikes 

This approach proposes an alternative to classical paradigms of neural coding such as 
the spike-rate coding approach of the perceptron (see Fig.|2]l. Instead of coding infor- 
mation in the mean firing frequency of neurons, we will present an original approach 
solving the function that we defined above. It uses a distributed probabilistic repre- 
sentation and we will assume here that the activity of neurons (such as the membrane 
potential) in the layer will represent dynamically the evidence of a correct match and 
that the output spiking signal signifies a set of elementary decisions made by the neu- 
rons. Following this process and focusing on every single spike, the process occurs 
repeatedly using two steps: Matching (M) and Pursuit (P). 
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(M) To each neuron is assigned a vector (or weight pattern) corresponding to its pre- 
ferred stimulus. Neurons compete in parallel to find the most probable single 
source component by integrating evidence according to their weight patterns. 
The first source to be detected should be the one corresponding to the highest 
activity. 

(P) The best match is assigned a decision which, once it has been taken, is assumed 
to be reliable: we can take into account this information before performing any 
further computations (and in particular finding a new match) so as to yield a new 
representation where we removed completely the detected source. 

We call this approach a greedy pursuit which is based on the recursion of two greedy 
mechanisms (detection - removal). These are here idealized but correspond to known 
aspects of neural activity (matching - suppression). 

We will see that this method is similar to the approach developed in the method of 
Matching Pursuit 1 15]. However, instead of a heuristic scheme, the algorithm will here 
be derived from known hypotheses and thanks to the description of the successive steps 
that may lead to the greedy pursuit, it may be considered as an optimization strategy 
of the goal that we defined above (namely maximizing the transfer of information). 
We will then propose an implementation using Integrate-and-fire neurons and test the 
efficiency of this artificial neural code. 

2.1.1 Matching: Detection of tlie most probable source component 

First, given the signal x G X, we are searching for the single source s*.Aj* £ 2 
that corresponds to the maximum a posteriori (MAP) realization for x (and knowing 
it is a realization of the LGM as it is defined in Eq. [T]i. We will address in general a 
single source by its index and strength by {j, s} so that the corresponding vector in S 
corresponds to a vector of zero values except for the value s at index j. The MAP is 
defined by: 

{f,s*} - ArgMaX{^. ,jP(a s}|x) (2) 

To evaluate P{{j, s}|x), the probability a posteriori of a single source knowing the 
signal, we have from Bayes' theorem 

{r,5*} = ArgMaX{^.,j[P(x|as}).P({j,s})] (3) 

where P(x|{j, s}) is the likelihood probability of a signal knowing a single source and 
P{{j, s}) is the a priori probability of the sources. 

To compute the likelihood we have to first define the model of the measurement ifisl 
p. 26]. We will first assume that we are in a low-noise limit environment (the global 
contrast is optimal and the eye/camera is adapted to the scene) so that we have no or 
little measurement noise. Knowing one component {j, s}, the only "noise" from the 
viewpoint of neuron j is the combination of the unknown sources {Q!fc}i<j<A'. It is 
thus the residual of the signal knowing {j, s}. We may thus write the noise as 

X = s.Aj + with = afe.Afc (4) 

' ^ k 

The residual of the signal (an image) is thus considered as an undetermined perturba- 
tioifi Assuming that the ak are independent random variables (since we know only 

''it should be stressed that the image model is still deterministic. 
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{j, s}), from the central limit theorem it comes that for a sufficiently high number of 
sources, the distribution of the random variable i/ converges to a normal distribution 
with known mean and co variance matrix. From the work of Atick 1 1], we know that 
for natural images this normal distribution is fairly homogeneous across natural images. 
We may either use another metric (based on the Mahalanobis distance, as exposed in 



illn ) or use a decorrelating kernel to yield a spherical probability distribution centered 
around the origin {E{v) = 0) of this "noise". Normalizing by the mean energy of im- 
ages in I, the residual signal is thus considered as a decorrelated noise of unit variance. 
From P{^^\{j, s}) — P(x — s.Aj) = P{v), it follows 

{r,s*} = ArgMaX{^.,j[logP(x|{j- s}) + logP({j,s})] 

= ArgMin{^.,j[||x-s.A,||V2-logP(0-,s})] (5) 

We will further consider that the dictionary was learned so that over a long period the 
neurons have similar statistics: the prior is uniform across sources and values. We thus 
have no prior knowledge or preference for any source. It thus comes 

|2 



{j*,s*} = ArgMin{^.,j||x-s.Aj 



ArgMinr, ,Js2.||Ajf - 2.s. < x, A, >] 



To minimize this bi-variate function, we may first minimize for every element j the 
coefficient Sj to get the corresponding s* = ArgMaXj,P({j, s}|x). From the above 
equations, this is equivalent to minimizing in the last equation the quadratic function 
of s which is minimal for the scalar coefficient 

_ <x,Aj > 
'~ l|A,|P 

that is for the scalar projection of the input on Aj. Then, since for every element j, 
s*.Aj is the projection of x on Aj, so that s*.Aj and x — s*.Aj are orthogonal, it 
follows from Pythagoras's theorem 

r = ArgMin^.[||x-4.A,f] 

= ArgMin^.[||xf -||li|^.A,f]=ArgMax^.||l|^f 

j* = ArgMaXj.| <x,^j^ > I (7) 

Finally, as defined in Eq. |2] we found that the source component that maximizes the 
probability is the projection of the signal on the normalized elements of the dictionary. 



This justifies the computation of the correlation in the perceptron model ||25[] as it pro- 
vides a measure of the log-probability under the assumptions that we used. However, 
using a different strategy as these linear systems, we will associate in our greedy ap- 
proach this inference with a lateral propagation of this information to the correlated 
neurons and only then resume the algorithm. 



2.1.2 Pursuit: Lateral interaction and Greedy pursuit of the best components 

Before detecting another single source component, we will take into account the infor- 
mation that we extracted from the signal by propagating it to the neighboring neurons 
using lateral interaction links. As we found the MAP source knowing the signal x, we 



7 



may pursue the algorithm by accounting for this inference on the signal knowing the 
element that we found. From 



P{{], 5}|x, {]\s*}) = P{{j, s}|x - s*.A,, ) (8) 

and since source are here supposed to have independent activitiefl the pursuit algo- 
rithm assumes that — ^knowing the previous detection — we may resume the detection 
on this residual signal. We will thus use this new residual signal in which we will then 
find a new component corresponding to the most probable single source. 
In this recursive approach, we will note as n the rank of the step in the pursuit (which 
begins at n = for the initialization). Writing A'^ = the first scalar pro- 

jection that we have to maximize — and which will serve as the initiaUzation of the 
algorithm — is given by : 

Cf.<x,^> ,9, 

Let's also note the address of the successive winning neuron from the first step n ~ 1 
as 

-,•(") =ArgMax^.|cj""'^ I (10) 

Knowing j^"-', in order to resume the pursuit at the next step, we saw that we need to 
compute the projection of the signal on the elements of the dictionary. Let's therefore 
set initially x^^^ = x and x'"' the successive residuals. In this greedy approach, we 
consider that the decision corresponding to the MAP criteria at step n is correct and 
that we may therefore update the residual and the corresponding activities cj" =< 
^(n-i)^ ^ > by subtracting to x'"^^^ its projection on the winning element of index 
(seeEq.© : 

= x("-i)-d;7'^^ (11) 

Furthermore, we don't need to feed this information back to the signal and we may 
directly compute the activity again for all vectors thanks to the linearity of the scalar 
product operator: 



^(n) _ r^(n-l) ^(«-l) ^ Aj Aj 



C)"> = cy'-" -C^l-'\<^, -^> (12) 



In this simplified framework, the choice of the best match and the update rule are 
independent of the choice of the norm Nj of the filters (see Eq. [TO] and fTZt . so that 
we may indifferently use in the following normalized filters (that is Nj = 1 for all 
neurons) so as to simplify the equations. It comes thus: 



(Initialization) 



Cf =<x,A, > 



(13) 



'For any realization of the images, individual sources have independent activities, that is that removing 
one source, one gets a new image (conform with the LGM model) and one does not change the probability 
distribution of the other sources. 
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This activities' update (Eq. [T2] l corresponds in neuro-physiological terminology to a 
lateral interaction. It will be proportional to i?^ where i?^ j(„) =< Aj,Aj^,^) > 
is the correlation of any element j with the winning element j^") and relates to the 
reproducing kernel in wavelet theory. 

Finally, we achieve the recursive greedy pursuit of best components as the iteration of 
respectively a "matching" and a "pursuit" step. While the residual energy is greater 
than a fixed threshold Hx*^"^ || > e, we compute : 



(Matching) 
(Pursuit) 



= ArgMaXj|cj"~^^| 



(14) 
(15) 



The greedy pursuit therefore transforms an incoming signal x in a list of ranked sources 
finally (from Eq.fTTl) the signal may be reconstructed as 



'^k=l...r. 



s('=).A^.(.) +x(") 



which is an approximation of the goal set in inverting Eq.[T]if the norm of the residual 
signal x^") converges to zero. 

2.1.3 Properties of the greedy pursuit 

This algorithm is exactly equivalent to Matching Pursuit f]3\. This algorithm is famil- 
iar in signal processing and is increasingly used for image and video processing \9,^. 
However, the use of the statistics of natural images statistically optimizes the coding 
efficiency by modifying the image space metric |21]. Moreover, the Bayesian inference 
framework allows to precisely tune the heuristic approach of the Matching Pursuit. It 
allows for instance to set a different prior or to include knowledge of the measurement 
noise that is adapted to the goal of the system (and hence a different matching criteria 
that may depend on the Nj). This algorithm presents similar computational complexity 
and properties il4l pp.4 12-9]. In particular 

C'^,=C'^''-C^'^^0 (16) 

and as a consequence the activity of a winning neuron is totally canceled. 
Moreover, although filters in the dictionary are here generally not orthogonal, the resid- 
ual image is orthogonal to the winning filter and 

||x(")f = ||x("-i)f -|s(")p.||A^,„,f (17) 

so that we may easily compute the Squared Error (SE) of the residual signal at every 
step of the coding. 

= Se("-i' - |s(")|2.||A^.(„,f (18) 
Se(") = ||xf-V ^ \s^'^\'.\\A^i.A\' 

It first implies that the stopping criteria may be computed using this computation with- 
out computing ||x(") ||. A further consequence of the monotonous decrease of the SE 
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from Eq. [T9]is — under the condition that the dictionary is at least complete — the con- 
vergence of the reconstruction lfl4l p.414]. Under this condition, the algorithm will 
therefore stop in finite time. 

Though simple, the greedy pursuit is a complex non-linear algorithm. In fact, the study 
of its behavior is non trivial and may involve chaotic dynamics fs^. In particular, it 
is obvious that the choice that is made at a giving step may influence all future steps. 
This implies that a failed match may propagate wrong information to following steps 
and therefore that the probability of a failure grows higher as the rank increases. These 
properties are discussed in t21i] and in particular we illustrated that the speed of con- 
vergence increases as the dictionary becomes more over-complete so that it provides 
an efficient representation for natural scenes in image processing tasks. 



2.2 Implementation using Integrate-and-Fire (IF) neurons 

From our knowledge of neural mechanisms in a neuronal layer, the model of greedy 
feature pursuit that we derived from an event-based computation in a parallel archi- 
tecture is particularly adapted to a model of neural computations. We will derive an 
implementation using a network of spiking neurons based on the same feed-forward 
architecture of the perceptron (see Fig. |2]l but implementing the greedy pursuit using 
lateral interactions. 

The activity is represented by a driving current that drives the potential Vj of Integrate- 



and-Fire neurons 01211 . For illustration purposes, the dynamics of the neurons will here 
be modeled by a simple linear integration of the driving current Cj (other integration 
schemes lead to similar formulations): 

r.jV,=p,.C, (20) 

The neurons are duplicated with opposite polarity pj = ±1 so that Cj — pj.\Cj \ to 
model the ON / OFF symmetry of simple cells i23n . The neuron will generate a spike 
when the potential reaches an arbitrary threshold that we set here to 1. 

To implement the computation of the match of an input with stored patterns, we 
define a dictionary which will be implemented by weight vectors Aj . These vectors are 
normalized as described above and the input is decorrelated. The linear feed-forward 
perceptron integrates synaptically the input x into an initial activity Cj such that 

Cj =< X, > (21) 

The scalar projection will therefore drive the potential of the neuron. We may predict 
from the monotonous integration that the first neuron to generate a spike will be the 
one that corresponds to the maximal rectified scalar projection of the input signal with 
the weight vectors of the network, that is 

i* =ArgMax^.|Q| (22) 

the firing time is t* — -0-^ and the potential is then Vj — ^-Cj — j^^- This is 
therefore a simple and biologically plausible implementation of a MAP estimate using 
the parallel architecture of the network which is in contrast with the complexity of this 
implementation on a single-processor computer. To implement the greedy algorithm, 
we then need to implement a lateral interaction on the neighboring neuron similar to the 
observed lateral propagation of information in VI lflol l2ll. In our scheme the interaction 



10 




Figure 2: Model of a neuronal layer as a communication channel. To understand 
the content of neural activity, we consider here that the neuronal layer implements the 
inverse of a forward model (that is the analysis in Fig. [T]i. The architecture is similar 
to the perceptron: the input (noted Xi) is matched with normalized weight patterns 
Aji (which are fixed in this paper) so as to provide an integrative activation value (the 
membrane potential) which in turn is non-linearly transformed to achieve a membrane 
potential which grows proportionally to the probability of matching a feature. Spikes 
represent decisions that are fed back on the correlated neighboring neurons using lateral 
interactions (that we represented for the first spiking neuron) but also on the axonal 
output which yield a spiking output Sj . 



should yield the same configuration in the network (activity and potential) as if the 
source that was detected was originally absent from the signal. In this model, if j* is 
the winning neuron, the activity should be subtracted by \Cj*\. R{jj • } (see Eq. [TSl l and 
the potential by this value integrated over t*. The lateral interaction is thus achieved 
by updating after each spike the activity of the neighboring neurons proportionally to 
their cross-correlation R{j.j*} with the corresponding winning neuron j* : 

Q-Q-IQ-l-^fe-} (23) 

and removing the potential that would be generated by the activity of the removed 
source: 

t* 

^ - — -IQ-I-Rbj*} 

that is simply 

This lateral interaction is here immediate and behaves as a refractory period on the 
winning neuron (Cj. ^ and Vj. ^ 0) and a graded inhibition on positively cor- 
related neurons. It involves a subtractive hyper-polarizing term on the potential and 
on the activity. Biologically, it is improbable that the lateral interaction could be in- 
stantaneous, but this lateral interaction could be implemented in a fast manner using 
a shunting lateral interaction [5] mediated by fast-spike inter-neurons. Finally, this 
simple implementation therefore implements the Matching Pursuit algorithm that we 
defined in Eq . [T4l and [TS] and we will apply it to simple visual tasks. 



3 Results: efficiency of Sparse Spike Coding 
3.1 Coding natural image patches 

We compared the method we described in this paper with similar techniques used to 
yield sparse and efficient codes such as the conjugate gradient method used by Ol- 
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Figure 3: Efficiency of the matching pursuit compared to conjugate gradient. We 

compared here the matching pursuit ('mp') method with the classical conjugate gradi- 
ent function ('cgf') method as is used in [17]. We present the results for the coding of 
a set of image patches drawn from a database of natural images. These results were 
obtained with the same fixed dictionary of edges for both methods. We plot the mean 
final residual error for two definitions of sparseness : (Left) the mean absolute sum of 
the coefficients and (Right) the number of active (or non-zero) coefficients (the coding 
step for MP). For this architecture, the sparse spike coding scheme appears to be more 
efficient to code natural image patches. 



shausen and Field lfl6ll . We used a similar context and architecture as these experi- 
ments and used in particular the database of inputs and the dictionary of filters learned 
in the Sparsenet algorithm. Namely, we used a set of 10^ 10 x 10 patches (so that 
M ~ 100) from whitened images drawn from a database of natural images. The weight 
matrix was computed using the SPARSENET algorithm with a 2-fold over-completeness 
(N = 200) that show similar structure as the receptive of simple cells in VI |23]. From 
the relation between the likelihood of having recovered the signal and the squared error 
in the new metric, the mean squared reconstruction error (L2-norm) is an appropriate 
measure of the coding efficiency for these whitened images. This measure represents 
the mean accuracy (in terms of the logarithm of a probability) between the data and the 
representation. We compared here this measure for different definitions and values for 
the "sparseness". 

First, by changing an internal parameter tuning the compromise between reconstruction 
error and sparsity (namely the estimated variance of the noise for the conjugate gradi- 
ent method and the stopping criteria in the pursuit), one could yield different mean 
residual error with different mean absolute value of the coefficients (see Fig. [3] left) or 
LI -norm. In a second experiment, we compared the efficiency of the greedy pursuit 
while varying the number of active coefficients (the LO-norm), that is the rank of the 
pursuit. To compare this method with the conjugate gradient, a first pass of the latter 
method was assigning for a fixed number of active coefficients the best neurons while a 
second pass optimized the coefficients for this set of "active" vectors (see Fig. [3] right). 
Computationally, the complexity of the algorithms and the time required by both meth- 
ods was similar. However, the pursuit is by construction more adapted to provide a 
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Figure 4: Implementation of the greedy pursuit using Integrate-and-Fire Neurons. 

We simulated here the activity of a network of Integrate-and-Fire neurons tuned to form 
a simple model of an hyper-column in the primary visual area (VI) to the presentation 
of a horizontal edge at t ~ 0. We show in this image the output spiking activity of 
16 neurons tuned for different orientations for the feed-forward (black bars) and the 
sparse spike coding (white bars) models during the first 150 ms. In this latter model, 
the correlation linked to the information already detected is propagated as a hyper- 
polarizing and shunting lateral interaction to the neighboring neurons : the response in 
both latency and spiking frequency to the oriented edge is clearly more selective. 



progressive and dynamical result while the conjugate gradient method had to be re- 
computed for every set of parameter Best results are those giving a lower error for a 
given sparsity or a lower sparseness (better compression) for the same error. In both 
cases, the Sparse Spike Coding provides a coding paradigm which is of better efficiency 
as the conjugate gradient. 

3.2 Model of a hyper-column in the primary visual area 

To illustrate the properties of the algorithm, 1 modeled a network of linear Integrate- 
and-Fire neurons forming a simple model of an hyper-column in the primary visual 
area (VI). This model consist of an isolated network of 16 neurons selective to dif- 
ferent orientations of contours and which are modeled as Gabor filters (which are here 
symmetric with circular envelopes). We compared a pure feed-forward model to a net- 
work implementing the lateral interactions that we described above (see Eq. |23] and 
l24l l. We show here the resulting spiking activity when one of the preferred stimuli (the 
horizontal edge) was continuously presented from time t ~ Q (see Fig.|4]i. 
We observe that the neuron corresponding to that preferred stimulus fires with the short- 
est latency but also produces the highest spike rate. Moreover, the activity of the neu- 
rons corresponding to non-preferred directions shows a lower spiking activity when 
implementing the greedy pursuit. This dynamic reflects the lateral interaction (here an 
inhibition to the positively correlated neurons) generated at every spike which is ob- 
served in VI 1 7]. In fact, compared to the linear model, the latency and the frequency 
of the neighboring neurons show a sharper response for neighboring edge orientations 
(see Fig.|5]l which corresponds to the high selectivity observed in simple cells from VI 
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Figure 5: Selectivity response of the network to orientation. Output spike firing 
rate to the presentation of a horizontal edge at time t = 0. for the linear feed-forward 
model (plain line), the sparse spike coding scheme (filled curve) and with divisive nor- 
malization (dashed line) for different orientations of the input stimulus. The narrower 
tuning curve for the latter two methods represents a more selective response to the fea- 
tures learned in synaptic weights and mimics the behavior of the neural response in the 
primary visual area. 



11241] . The selectivity of this model was compared with the model of divisive normaliza- 
tion fl^, suggesting that this simple implementation of Integrate-and-Fire neurons — 
linked by lateral interactions and removing dynamically the redundancy in the signal — 
could provide a model for the complex processing occurring in cortical areas. 



Conclusion 

We presented here a model for neural processing which provides an alternative to the 
feed-forward and spike-rate coding approaches. Focusing on the parallel architecture 
of cortical areas, we based our computations on spiking events. Defining the function 
of sensory areas as matching the input to a model with unknown parameters, the activ- 
ity of the network represented a probabilistic evaluation of the accuracy of the match. 
From this representation, we inferred the best match using the Bayes rule and an in- 
ference decision criterion. We then derived an algorithm which may be implemented 
using lateral interactions : it removes for every spike the corresponding activity to 
correlated neurons. Simulations of this model compare to the non-linear behavior of 
neurons in biological network such as the primary visual cortex (VI). 
This model is based on the Matching Pursuit algorithm and provides a general frame- 
work for modeling the complex behavior of networks of spiking neurons. Particularly, 
it can be extended to multi-layered networks and provides an efficient code for natu- 



ral images as we described elsewhere 112111 . Further studies provided a learning scheme 
based on an Hebbian learning rule which yields an unsupervised learning of the sources 
as independent components of the signal to describe ll20ll . The model thus provides an 
algorithm of Sparse Spike Coding which is particularly efficient for visual tasks. 
This simple strategy thus suggest that the inherent complexity of the neural activity 
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is perhaps not simply the reflection of the computational details of neurons but may 

rather be the consequence of the parallel event-based dynamics of the neural activity. 
Although our model is a simplistic caricature compared to the behavior of biological 
neurons, it provides a simple algorithm which is compatible with some complex char- 
acteristic of the response of neuronal populations. It thus proposes a challenge for 
discovering the mechanisms underlying the efficiency of nervous systems by focusing 
on large-scale networks of spiking neurons. 

Reproducible research 

Scripts reproducing all figures may be obtained from the author upon request. 
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